ABSTRACT

Automated classification of human anatomy is an important
prerequisite for many computer-aided diagnosis systems. The
spatial complexity and variability of anatomy throughout the
human body make classification difficult. “Deep learning”
methods such as convolutional networks (ConvNets) outperform
other state-of-the-art methods in image classification
tasks. In this work, we present a method for organ- or
body-part-specific anatomical classification of medical images
acquired using computed tomography (CT) with ConvNets. We
train a ConvNet, using 4,298 separate axial 2D key-images, to
learn 5 anatomical classes. Key-images were mined from a
hospital PACS archive, using a set of 1,675 patients. We show
that a data augmentation approach can help to enrich the data
set and improve classification performance. Using ConvNets
and data augmentation, we achieve an anatomy-specific
classification error of 5.9 % and an average area-under-the-curve
(AUC) value of 0.998 in testing. We demonstrate
that deep learning can be used to train very reliable and
accurate classifiers that could initialize further computer-aided
diagnosis.

Index Terms - Image Classification, Computed Tomography
(CT), Convolutional Networks, Deep Learning

1. INTRODUCTION 

Medical image classification can be an important component
of many computer-aided detection (CADe) and diagnosis
(CADx) systems. Achieving high accuracy in automated
classification of anatomy is challenging, given the vast
scope of anatomic variation. In this work, our aim is to
automatically classify axial CT images into 5 anatomical classes
(see Fig. 1). To establish a ground truth for training and
testing, we mine radiological reports that refer to key-images,
together with the associated DICOM image tags, and manually
correct the resulting labels where necessary. Using computer
vision and medical image computing techniques, we train a
classifier that reproduces these labels with low error rates.



Fig. 1. Example key-images of 5 classes of anatomy in our 
data set: neck, lungs, liver, pelvis and legs. 


2. METHOD 

Recently, the availability of large annotated training sets and
the accessibility of affordable parallel computing resources
via GPUs have made it feasible to train “deep” convolutional
networks (ConvNets). ConvNets have popularized the topic
of “deep learning” in computer vision research [1]. Through
the use of ConvNets, not only have great advances been made
in the classification of natural images […], but substantial
advancements have also been made in biomedical applications,
such as digital pathology […]. Additionally, recent work has
shown how ConvNets can substantially improve the performance
of state-of-the-art CADe systems […].

2.1. Convolutional networks 

In this work, we apply ConvNets to build an anatomy-specific
classifier for CT images. ConvNets are named for their
convolutional filters, which are used to compute image features
for classification. In this work, we use 5 cascaded layers of
convolutional filters. All convolutional filter kernel elements
are trained from the data in a supervised fashion. This has
major advantages over more traditional CAD approaches that
use hand-crafted features designed from human experience:
ConvNets have a better chance of capturing the “essence” of
the imaging data set used for training than hand-crafted
features do […]. Examples of trained filters of the first
convolutional layer can be seen in Fig. 2.
These first-layer filters capture low spatial frequency signals.
In contrast, a mixed set of low and high frequency patterns
exists in the first convolutional layer shown in […]. This
suggests that the essential information for the task of
classifying holistic slice-based body regions lies in
low-frequency spatial intensity contrasts. These automatically
learned low-frequency filters need no tuning by hand, unlike
approaches based on intensity histograms, e.g. […].

Fig. 2. The first layer of learned convolutional kernels of a
ConvNet trained on medical CT images.

Between convolutional layers, the ConvNet performs max-pooling
operations in order to summarize feature responses across
non-overlapping neighboring pixels (see Fig. 3). This allows
the ConvNet to learn features that are largely invariant to
small spatial variations of objects in the images. Feature
responses after the 5th convolutional layer feed into a
fully-connected neural network. This network learns how to
interpret the feature responses and make anatomy-specific
classifications. Our ConvNet uses a final softmax layer, which
provides a probability for each object class (see Fig. 3). In
order to avoid overfitting, the fully-connected layers are
constrained using the “DropOut” method […]. DropOut acts as a
regularizer when training the ConvNet by preventing
co-adaptation of units in the neural network. We use an
open-source implementation (cuda-convnet¹) by Krizhevsky et al.
[…], which efficiently trains the ConvNet using GPU acceleration.
Further speed-ups in both training and evaluation are achieved
by using rectified linear units, $f(x) = \max(0, x)$, as the
neuron activation function instead of the traditional neuron
models $f(x) = \tanh(x)$ or $f(x) = (1 + e^{-x})^{-1}$ [2].
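To make the layer operations above concrete, the following is a minimal NumPy sketch of the elementwise and pooling steps mentioned: the ReLU and the traditional logistic activation, max-pooling over non-overlapping 2 x 2 neighborhoods, inverted dropout, and the final softmax. It is not cuda-convnet code; the function names, the 2 x 2 pooling window and the "inverted" dropout scaling are illustrative assumptions.

```python
import numpy as np

def relu(x):
    # Rectified linear unit: f(x) = max(0, x); does not saturate for x > 0.
    return np.maximum(0.0, x)

def logistic(x):
    # Traditional saturating activation: f(x) = (1 + e^{-x})^{-1}.
    return 1.0 / (1.0 + np.exp(-x))

def max_pool_2x2(fmap):
    # Max over non-overlapping 2 x 2 neighborhoods of an (H, W) feature map.
    # Assumes H and W are even (a simplification for this sketch).
    h, w = fmap.shape
    return fmap.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def dropout(activations, rate, rng, training=True):
    # Inverted dropout: during training, zero each unit with probability
    # `rate` and rescale survivors so the expected activation is unchanged.
    # At evaluation time the activations pass through untouched.
    if not training or rate == 0.0:
        return activations
    keep = rng.random(activations.shape) >= rate
    return activations * keep / (1.0 - rate)

def softmax(logits):
    # Numerically stable softmax: one probability per class, summing to 1.
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

if __name__ == "__main__":
    fmap = np.arange(16, dtype=float).reshape(4, 4) - 8.0
    print(max_pool_2x2(relu(fmap)))  # [[0. 0.] [5. 7.]]
    print(softmax(np.array([2.0, 1.0, 0.1, 0.0, -1.0])))  # 5 class probabilities
```

Pooling keeps only the strongest response in each neighborhood, which is what makes the pooled features tolerant to small shifts; the softmax output has one entry per anatomical class.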

2.2. Data mining of key-images 

We retrieve medical images (many related to liver disease)
from the Picture Archiving and Communication System
(PACS) of the Clinical Center of the National Institutes of
Health by searching for a set of keywords in the radiological
reports. Each image is then assigned a ground-truth label
based on the ‘StudyDescription’ and ‘BodyPartExamined’
DICOM tags (manually corrected if necessary). This results
in the 5 classes of images shown in Fig. 1. Images that show
anatomy of multiple classes at once are duplicated, and each
image copy is assigned one of the class labels. This case
commonly occurs in the transition region between lung and
liver. In these regions, our ConvNet assigns roughly equal
probabilities to the classes involved.

¹ https://code.google.com/p/cuda-convnet2

Fig. 3. ConvNet applied to an axial CT image. The number of
convolutional filters and neural network connections is
shown for each layer.
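The labeling step in Sec. 2.2 can be sketched as follows. This is a minimal illustration under loud assumptions: the keyword table `CLASS_KEYWORDS` is hypothetical (the text does not list the keywords or tag values used), DICOM headers are represented as plain dictionaries rather than read with a DICOM library, and `manual_labels` stands in for the manual correction step.

```python
from dataclasses import dataclass

# Hypothetical keyword table: the mapping actually used for the PACS data is
# not given in the text, so these keywords are illustrative only.
CLASS_KEYWORDS = {
    "neck": ("NECK",),
    "lungs": ("CHEST", "LUNG", "THORAX"),
    "liver": ("ABDOMEN", "LIVER"),
    "pelvis": ("PELVIS",),
    "legs": ("LEG", "LOWER EXTREMITY", "KNEE"),
}

@dataclass(frozen=True)
class LabeledImage:
    image_id: str
    label: str

def candidate_labels(study_description, body_part_examined):
    # Match keywords against both DICOM tag values (case-insensitive).
    text = f"{study_description} {body_part_examined}".upper()
    return [cls for cls, words in CLASS_KEYWORDS.items()
            if any(word in text for word in words)]

def label_key_image(image_id, tags, manual_labels=None):
    # `manual_labels` stands in for manual correction: when a reviewer
    # supplies labels, they replace the tag-derived candidates.
    if manual_labels is not None:
        labels = list(manual_labels)
    else:
        labels = candidate_labels(tags.get("StudyDescription", ""),
                                  tags.get("BodyPartExamined", ""))
    # An image showing several classes (e.g. lung and liver near the
    # diaphragm) is duplicated: one copy per class label.
    return [LabeledImage(image_id, label) for label in labels]

if __name__ == "__main__":
    tags = {"StudyDescription": "CT CHEST", "BodyPartExamined": "CHEST"}
    print(label_key_image("key-001", tags))
    print(label_key_image("key-002", tags, manual_labels=["lungs", "liver"]))
```

Duplicating a transition-region image under both labels is what later leads the trained ConvNet to split its probability between those classes.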


2.3. Data augmentation 

We enrich our data set by applying spatial deformations to
each image, using random translations, rotations and non-rigid
deformations. Each non-rigid training deformation $t$ is
computed by fitting a thin-plate spline (TPS) to a regular grid
of 2D control points $\{\boldsymbol{\omega}_i,\ i = 1, 2, \ldots, K\}$.
These control points can be randomly transformed at the 2D
slice level, and a deformed image can be generated using a
radial basis function $\phi(r)$:

$$t(\mathbf{x}) = \sum_{i=1}^{K} c_i \, \phi\left(\lVert \mathbf{x} - \boldsymbol{\omega}_i \rVert\right) \tag{1}$$

We use $\phi(r) = r^2 \log(r)$, which is commonly applied for TPS.
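A minimal NumPy sketch of fitting and evaluating such a TPS follows. Two assumptions are worth stating: the standard TPS formulation adds an affine term (with the usual side conditions) to the radial sum of Eq. (1), and this sketch includes it so the linear system is well posed; and the uniform random displacement of a regular grid on $[0, 1]^2$ is a hypothetical choice of deformation model, not the authors' parameters.

```python
import numpy as np

def tps_kernel(r):
    # phi(r) = r^2 log(r), extended continuously with phi(0) = 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        val = r ** 2 * np.log(r)
    return np.where(r > 0, val, 0.0)

def fit_tps(src, dst):
    # Solve for RBF weights c_i (one column per output coordinate) plus an
    # affine part so that the mapping sends control point src[i] to dst[i].
    k = src.shape[0]
    dists = np.linalg.norm(src[:, None, :] - src[None, :, :], axis=-1)
    P = np.hstack([np.ones((k, 1)), src])  # (K, 3) affine basis [1, x, y]
    L = np.zeros((k + 3, k + 3))
    L[:k, :k] = tps_kernel(dists)
    L[:k, k:] = P
    L[k:, :k] = P.T                        # side conditions on the weights
    rhs = np.zeros((k + 3, 2))
    rhs[:k] = dst
    params = np.linalg.solve(L, rhs)
    return params[:k], params[k:]          # weights (K, 2), affine (3, 2)

def apply_tps(points, src, weights, affine):
    # Evaluate t(x) = sum_i c_i phi(||x - omega_i||) + affine(x) per point.
    dists = np.linalg.norm(points[:, None, :] - src[None, :, :], axis=-1)
    P = np.hstack([np.ones((len(points), 1)), points])
    return tps_kernel(dists) @ weights + P @ affine

def random_grid_deformation(grid_size, max_shift, rng):
    # Regular grid of control points on [0, 1]^2, each displaced by at most
    # `max_shift` per axis (hypothetical uniform displacement model).
    u = np.linspace(0.0, 1.0, grid_size)
    src = np.stack(np.meshgrid(u, u, indexing="ij"), axis=-1).reshape(-1, 2)
    dst = src + rng.uniform(-max_shift, max_shift, size=src.shape)
    return src, dst

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    src, dst = random_grid_deformation(5, 0.03, rng)
    weights, affine = fit_tps(src, dst)
    print(np.abs(apply_tps(src, src, weights, affine) - dst).max())  # ~0
```

To produce the deformed image itself, one would evaluate the mapping at every output pixel (typically fitting it in the inverse direction, from deformed to original positions) and resample the image there, for example with `scipy.ndimage.map_coordinates`.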
A typical TPS deformation field and deformed variations of
an example image grid are shown in Fig. 4. Varying the
translation, rotation and non-rigid deformation is a useful way
to increase the variety and sample space of the available
training data: with $N_t$ translations, $N_r$ rotations and $N_d$
non-rigid deformations per image, $N$ images yield
$N_{aug} = N \times N_t \times N_r \times N_d$ variations of the
imaging data. The maximum amounts of translation, rotation and
non-rigid deformation are chosen such that the resulting
deformations resemble plausible physical variations of the
medical images. This approach is commonly referred to as data
augmentation and can help avoid overfitting [2]. Our set of
$N_{aug}$ axial images is then rescaled to 256 x 256 and used
to train a ConvNet with a standard architecture for multi-class
image classification (as described in Sec. 2.1).
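The count $N_{aug} = N \times N_t \times N_r \times N_d$ can be checked against Table 1. Assuming $N_t = N_r = 2$ for every class (as stated in Sec. 3.1), the per-class $N_d$ can be back-calculated from the original and augmented counts; these $N_d$ values are inferred here, not reported in the text. The leg count of 477 follows from the table's total of 4,298 original images.

```python
# Original and augmented image counts per class (Table 1).
ORIGINAL = {"leg": 477, "pelvis": 104, "liver": 2684, "lung": 590, "neck": 443}
AUGMENTED = {"leg": 24804, "pelvis": 22048, "liver": 32208, "lung": 25960, "neck": 23036}
N_T, N_R = 2, 2  # translations and rotations per image (Sec. 3.1)

def n_aug(n, n_t, n_r, n_d):
    """Number of augmented images: N_aug = N * N_t * N_r * N_d."""
    return n * n_t * n_r * n_d

def infer_n_d(n, n_augmented, n_t=N_T, n_r=N_R):
    """Back-calculate N_d; raises if the count is not an exact multiple."""
    q, rem = divmod(n_augmented, n * n_t * n_r)
    if rem:
        raise ValueError("augmented count is not a multiple of N * N_t * N_r")
    return q

n_d = {cls: infer_n_d(ORIGINAL[cls], AUGMENTED[cls]) for cls in ORIGINAL}
# Consistency check: recomputing N_aug from the inferred N_d reproduces Table 1.
assert all(n_aug(ORIGINAL[c], N_T, N_R, n_d[c]) == AUGMENTED[c] for c in ORIGINAL)
print(n_d)  # {'leg': 13, 'pelvis': 53, 'liver': 3, 'lung': 11, 'neck': 13}
```

The small pelvis class receives many more deformations per image than the large liver class, which is how the augmented set becomes more balanced.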
Fig. 4. Data augmentation using varying random
transformations, rotations and non-rigid deformations using
thin-plate-spline (TPS) interpolation on an example image grid.


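Before data augmentation (error 9.6%); rows: ground truth, columns: predicted

            legs  pelvis  liver  lungs  neck
legs          90       0      0      0     0
pelvis         0      24      2      0     1
liver          0       6    484     42     0
lungs          0       0     28     93     5
neck           0       0      0      0   102

After data augmentation (error 5.9%); rows: ground truth, columns: predicted

            legs  pelvis  liver  lungs  neck
legs          90       0      0      0     0
pelvis         0      27      0      0     0
liver          0       0    518     14     0
lungs          0       0     38     88     0
neck           0       0      0      0   102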


Fig. 5. Confusion matrices on the original test images before
and after data augmentation.
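The error rates quoted with Fig. 5 follow directly from the confusion matrices: one minus the trace divided by the total number of test images. A small pure-Python check, assuming rows are ground truth and columns are predictions in the order legs, pelvis, liver, lungs, neck (each row sums to the same class total in both matrices, which supports this reading):

```python
CLASSES = ["legs", "pelvis", "liver", "lungs", "neck"]

# Rows: ground-truth class; columns: predicted class (order as in CLASSES).
BEFORE = [[90, 0, 0, 0, 0],
          [0, 24, 2, 0, 1],
          [0, 6, 484, 42, 0],
          [0, 0, 28, 93, 5],
          [0, 0, 0, 0, 102]]
AFTER = [[90, 0, 0, 0, 0],
         [0, 27, 0, 0, 0],
         [0, 0, 518, 14, 0],
         [0, 0, 38, 88, 0],
         [0, 0, 0, 0, 102]]

def error_rate(cm):
    # Fraction of test images off the diagonal (misclassified).
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return 1.0 - correct / total

def per_class_recall(cm):
    # Correctly classified images of each true class / images of that class.
    return {cls: cm[i][i] / sum(cm[i]) for i, cls in enumerate(CLASSES)}

print(f"{error_rate(BEFORE):.1%}")  # 9.6%  (84 of 877 test images)
print(f"{error_rate(AFTER):.1%}")   # 5.9%  (52 of 877 test images)
```

Both matrices cover the same 877 test images, about 20 % of the 4,298 key-images, consistent with the 80/20 split in Sec. 3.1.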


3. RESULTS 
3.1. Key-image data set 

We use 80 % of our total data set for training a multi-class
ConvNet as described in Sec. 2.1 and reserve 20 % for testing.
Our data augmentation step (see Sec. 2.3) drastically increases
the amount of training and testing data, as shown in Table 1.
The number of deformations for each anatomical class is chosen
so that the augmented images form a more balanced and enriched
data set: we use $N_t = 2$ and $N_r = 2$ while adjusting $N_d$
for each class. Table 1 further shows that data augmentation
reduces the classification error in testing from 9.6 % to
5.9 % and improves the average area-under-the-curve (AUC)
value of the receiver-operating-characteristic (ROC) analysis
from 0.994 to 0.998. The confusion matrices in Fig. 5 show a
clear reduction of misclassification after data augmentation
when testing on the original test set. We further illustrate the
feature space of our trained ConvNet using t-SNE [12, 13] in
Fig. 6. Most classes are clearly separated. An overlapping
cluster can be seen at the interface between lung and liver
images; it is caused by key-images near the diaphragm that
show both lung and liver.
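The text does not state how per-class AUC values were computed for the 5-class problem; a common choice, assumed here, is one-vs-rest ROC analysis on the softmax outputs, where each class in turn is treated as positive. The sketch below computes AUC through its equivalence with the Mann-Whitney statistic (the probability that a random positive scores higher than a random negative, ties counting one half).

```python
def roc_auc(scores, labels):
    """AUC of a binary ROC curve via the Mann-Whitney U statistic.

    scores: classifier scores (higher = more likely positive)
    labels: 1 for positive, 0 for negative
    Ties between a positive and a negative count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))

def one_vs_rest_auc(probs, true_idx, n_classes):
    """Per-class AUCs from softmax outputs (assumed one-vs-rest scheme)."""
    return [roc_auc([p[k] for p in probs],
                    [1 if t == k else 0 for t in true_idx])
            for k in range(n_classes)]

if __name__ == "__main__":
    probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]]
    aucs = one_vs_rest_auc(probs, [0, 1, 2], 3)
    print(aucs, sum(aucs) / len(aucs))  # per-class AUCs and their mean
```

The pairwise loop is quadratic in the number of images; for test sets of this size a rank-based formulation gives the same value in O(n log n).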


Table 1. Image data set before (1) and after (2) data augmentation.
Data augmentation improves both the error rate and the AUC values.

Organ            # images (1)   # images (2)   AUC (1)   AUC (2)
leg                       477         24,804     1.000     1.000
pelvis                    104         22,048     0.996     1.000
liver                   2,684         32,208     0.994     0.999
lung                      590         25,960     0.981     0.999
neck                      443         23,036     0.999     1.000
Sum / mean AUC          4,298        128,056     0.994     0.998
Error                                             9.6 %     5.9 %




3.2. Full torso CT volume 

For qualitative evaluation, we also apply our trained ConvNet
classifier to a full-torso CT examination on a slice-by-slice
basis (dimensions of [512, 512, 652] voxels and
[0.98, 0.98, 1.5] mm voxel spacing). The resulting
anatomy-specific probabilities for each slice are plotted as
profiles next to the coronal slice of the CT volume in Fig. 7.
Note how the interface between the lungs and liver at the level
of the diaphragm is captured by roughly equal probabilities
from the ConvNet. This classification result is obtained in
less than 1 minute on a modern desktop computer with a GPU
card (Dell Precision T7500, 24 GB RAM, NVIDIA Titan Z).

Fig. 6. 2D embedding of ConvNet features using t-SNE on
a subset of test images. Each dot represents a key-image in
feature space. The color-coding is based on the ground-truth
label of each key-image.
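The slice-by-slice evaluation of Sec. 3.2 amounts to running the 2D classifier on every axial slice and stacking the softmax outputs into one probability profile per class. A minimal NumPy sketch, in which `classify_slice` is a hypothetical stand-in for the trained ConvNet and nearest-neighbor resampling to 256 x 256 is an assumed simplification of the rescaling step:

```python
import numpy as np

CLASSES = ("neck", "lungs", "liver", "pelvis", "legs")

def downsample_slice(slice_2d, out_size=256):
    # Nearest-neighbor resampling to out_size x out_size (a simple stand-in
    # for whatever interpolation the original pipeline used).
    h, w = slice_2d.shape
    rows = (np.arange(out_size) * h) // out_size
    cols = (np.arange(out_size) * w) // out_size
    return slice_2d[np.ix_(rows, cols)]

def probability_profile(volume, classify_slice, z_spacing_mm):
    """Classify every axial slice of a (rows, cols, slices) volume.

    classify_slice: hypothetical callable mapping a 256 x 256 slice to a
    vector of class probabilities (e.g. a trained ConvNet's softmax output).
    Returns slice positions in mm and an array of shape (slices, n_classes).
    """
    n_slices = volume.shape[2]
    probs = np.stack([classify_slice(downsample_slice(volume[:, :, z]))
                      for z in range(n_slices)])
    z_mm = np.arange(n_slices) * z_spacing_mm
    return z_mm, probs

if __name__ == "__main__":
    # Dummy classifier returning uniform probabilities, for illustration only.
    uniform = lambda s: np.full(len(CLASSES), 1.0 / len(CLASSES))
    z_mm, probs = probability_profile(np.zeros((512, 512, 8)), uniform, 1.5)
    print(z_mm[-1], probs.shape)  # 10.5 (8, 5)
```

Plotting each column of `probs` against `z_mm` gives one profile per class; for the 652-slice volume with 1.5 mm spacing, the profile spans 651 x 1.5 = 976.5 mm.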

4. DISCUSSION 

This work demonstrates how deep ConvNets can be applied to
effective anatomy-specific classification of medical images.
Similar motives to ours are explored in content-based image
retrieval methods […]. However, the association between
clinical reports and image scans can be very loose, which makes
retrieval based on clinical reports difficult. In this paper, we
focus on manually labeled key-images that allow us to train an
anatomy-specific classifier. Other related work includes the
ImageCLEF medical image annotation tasks of 2005-2007.
However, these tasks used highly subsampled 2D versions of
medical images (32 x 32 pixels) […]. Methods applied to the
ImageCLEF tasks included local image descriptors and
intensity histograms in a bag-of-features approach [16]. We
concentrate on classifying images much closer to their original
512 x 512 resolution, namely rescaled to 256 x 256. We
show that ConvNets can model this higher level of detail in the
images and generalize well to the large variations found in
medical imaging data, with promising quantitative and
qualitative results. Some axial slices in the lower abdomen had
erroneously high probabilities for lung or legs. Here, it could
be beneficial to introduce an additional ‘lower abdomen’ class.
Our method could easily be extended to include further
augmentation, such as image scaling, in order to model
variations in patient size. This type of anatomy classifier could
be employed as an initialization step for further and more
detailed analysis, such as disease- and organ-specific
computer-aided detection and/or diagnosis.

Fig. 7. Organ-specific probabilities for a whole-body CT
scan.